Abstract. Kernelization is a formalization of efficient preprocessing for NP-hard problems using 
the framework of parameterized complexity. Among open problems in kernelization it has been 
asked many times whether there are deterministic polynomial kernelizations for Subset Sum and 
Knapsack when parameterized by the number n of items. 

We answer both questions affirmatively by using an algorithm for compressing numbers due to 
Frank and Tardos (Combinatorica 1987). This result had been first used by Marx and Végh (ICALP 
2013) in the context of kernelization. We further illustrate its applicability by giving polynomial 
kernels also for weighted versions of several well-studied parameterized problems. Furthermore, 
when parameterized by the different item sizes we obtain a polynomial kernelization for Subset 
Sum and an exponential kernelization for Knapsack. Finally, we also obtain kernelization results 
for polynomial integer programs. 


1 Introduction 

The question of handling numerical values is of fundamental importance in computer science. 
Typical issues are precision, numerical stability, and representation of numbers. In the present 
work we study the effect that the presence of (possibly large) numbers has on weighted versions 
of well-studied NP-hard problems. In other words, we are interested in the effect of large numbers 
on the computational complexity of solving hard combinatorial problems. Concretely, we focus 
on the effect that weights have on the preprocessing properties of the problems, and study 
this question using the notion of kernelization from parameterized complexity. Very roughly, 
kernelization studies whether there are problem parameters such that any instance of a given 
NP-hard problem can be efficiently reduced to an equivalent instance of small size in terms of 
the parameter. Intuitively, one may think of applying a set of correct simplification rules, but 
additionally one has a proven size bound for instances to which no rule applies. 

The issue of handling large weights in kernelization has been brought up again and again 
as an important open problem in kernelization. For example, it is well-known that 
for the task of finding a vertex cover of at most k vertices for a given unweighted graph G one 
can efficiently compute an equivalent instance $(G', k')$ such that G' has at most 2k vertices. 
Unfortunately, when the vertices of G are additionally equipped with positive rational weights 
and the chosen vertex cover needs to obey some specified maximum weight $W \in \mathbb{Q}$ then it was 
long unknown how to encode (and shrink) the vertex weights to bitsize polynomial in k. In 
this direction, Chlebík and Chlebíková [5] showed that an equivalent graph G' with total vertex 
weight at most $2w^*$ can be obtained in polynomial time, whereby $w^*$ denotes the minimum 
weight of a vertex cover of G. This, however, does not mean that the size of G' is bounded, 
unless one makes the additional assumption that the vertex weights are bounded from below; 
consequently, their method only yields a kernel with that extra requirement of vertex weights 
being bounded away from zero. In contrast, we do not make such an assumption. 

Let us attempt to clarify the issue some more. The task of finding a polynomial kernelization 
for a weighted problem usually comes down to two parts: (1) Deriving reduction rules that work 
correctly in the presence of weights. The goal, as for unweighted problems, is to reduce the 
number of relevant objects, e.g., vertices, edges, sets, etc., to polynomial in the parameter. (2) 
Shrinking or replacing the weights of remaining objects such that their encoding size becomes 
(at worst) polynomial in the parameter. The former part usually benefits from existing literature 
on kernels of unweighted problems, but regarding the latter only little progress was made. 

For a pure weight reduction question let us consider the Subset Sum problem. Therein we 
are given n numbers $a_1, \ldots, a_n \in \mathbb{N}$ and a target value $b \in \mathbb{N}$ and we have to determine whether 
some subset of the n numbers has sum exactly b. Clearly, reducing such an instance to size 
polynomial in n hinges on the ability of handling large numbers $a_i$ and $b$. Let us recall that a 
straightforward dynamic program solves Subset Sum in time $O(nb)$, implying that large weights 
are to be expected in hard instances. Harnik and Naor showed that taking all numbers 
modulo a sufficiently large random prime p of magnitude about $2^{2n}$ produces an equivalent 
instance with error probability exponentially small in n. (Note that the obtained instance is 
with respect to arithmetic modulo p.) The total bitsize then becomes $O(n^2)$. Unfortunately, 
this elegant approach fails for more complicated problems than Subset Sum. 
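
The dynamic program mentioned above can be sketched as follows. This is a minimal illustration of the textbook $O(nb)$ table-filling approach, not code from the paper; the function name `subset_sum_dp` is an assumption made for this example. It shows why the running time depends on the magnitude of b rather than only on n.

```python
def subset_sum_dp(numbers, target):
    """Decide Subset Sum with the textbook O(n * b) dynamic program.

    `numbers` are non-negative integers and `target` is the value b.
    (Hypothetical helper written for illustration only.)
    """
    # reachable[s] is True iff some subset of the numbers processed so far sums to s.
    reachable = [True] + [False] * target
    for a in numbers:
        # Go downwards so that each number is used at most once.
        for s in range(target, a - 1, -1):
            if reachable[s - a]:
                reachable[s] = True
    return reachable[target]


if __name__ == "__main__":
    print(subset_sum_dp([3, 34, 4, 12, 5, 2], 9))
```

The table has b + 1 entries and is scanned once per number, so the work grows linearly with b, which is exponential in the encoding length of b.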

Consider the Subset Range Sum variant of Subset Sum where we are given not a single 
target value b but instead a lower bound L and an upper bound U with the task of finding 
a subset with sum in the interval $\{L, \ldots, U\}$. Observe that taking the values $a_i$ modulo a 
large random prime faces the problem of specifying the new target value(s), in particular if 
$U - L > p$ because then every remainder modulo p is possible for the solution. Nederlof et al. [22] 
circumvented this issue by creating not one but in fact a polynomial number of small instances. 
Intuitively, if a solution has value close to either L or U then the randomized approach will 
work well (possibly making a separate instance for target values close to L or U). For solutions 
sufficiently far from L or U there is no harm in losing a little precision and dividing all numbers 
by 2; then the argument iterates. Overall, because the number of iterations is bounded by the 
logarithm of the numbers (i.e., their encoding size), this creates a number of instances that is 
polynomial in the input size, with each instance having size $O(n^2)$; if the initial input is “yes” 
then at least one of the created instances is “yes”. 

To our knowledge, the mentioned results are the only positive results that are aimed directly 
at the issue of handling large numbers in the context of kernelization. Apart from these, there 
are of course results where the chosen parameter bounds the variety of feasible weights and 
values, but this only applies to integer domains; e.g., it is easy to find a kernel for Weighted 
Vertex Cover when all weights are positive integers and the parameter is the maximum total 
weight k. On the negative side, there are a couple of lower bounds that rule out polynomial 
kernelizations for various weighted and ILP problems. Note, however, that the 
lower bounds appear to “abuse” large weights in order to build gadgets for lower bound proofs 
that also include a super-polynomial number of objects as opposed to having just few objects 
with weights of super-polynomial encoding size. In other words, the known lower bounds pertain 
rather to the first step, i.e. finding reduction rules that work correctly in the presence of weights, 
than to the inherent complexity of the numbers themselves. Accordingly, since 2010 the question 
for a deterministic polynomial kernelization for Subset Sum or Knapsack with respect to the 
number of items can be found among open problems in kernelization. 

Recently, Marx and Végh [21] gave a polynomial kernelization for a weighted connectivity 
augmentation problem. As a crucial step, they use a technique of Frank and Tardos [12], originally 
aimed at obtaining strongly polynomial-time algorithms, to replace rational weights by 
sufficiently small and equivalent integer weights. They observe and point out that this might be 
a useful tool to handle in general the question of getting kernelizations for weighted versions of 
parameterized problems. It turns out that, more strongly, Frank and Tardos’ result can also be 
used to settle the mentioned open problems regarding Knapsack and Subset Sum. We point 
out that this is a somewhat circular statement since Frank and Tardos had set out to, amongst 
others, improve existing algorithms for ILPs, which could be seen as very general weighted 
problems. 

Our work. We use the theorem of Frank and Tardos [12] to formally settle the open problems, 
i.e., we obtain deterministic kernelizations for Subset Sum(n) and Knapsack(n), in Sect. 3. 
Generally, in the spirit of Marx and Végh’s observation, this yields polynomial kernelizations 
whenever one is able to first reduce the number of objects, e.g., vertices or edges, to 
polynomial in the parameter. The theorem can then be used to sufficiently shrink the weights 
of all objects such that the total size becomes polynomial in the parameter. 

Motivated by this, we consider weighted versions of several well-studied parameterized problems, 
e.g., d-Hitting Set, d-Set Packing, and Max Cut, and show how to reduce the number 
of relevant structures to polynomial in the parameter. An application of Frank and Tardos’ result 
then implies polynomial kernelizations. We present our small kernels for weighted problems 
in Sect. 4. 

Next, we consider the Knapsack problem and its special case Subset Sum, in Sect. 5. 
For Subset Sum instances with only k item sizes, we derive a kernel of size polynomial in k. 
This way, we are improving the exponential-size kernel for this problem due to Fellows et 
al. [10]. We also extend the work of Fellows et al. in another direction by showing that the more 
general Knapsack problem is fixed-parameter tractable (i.e. has an exponential kernel) when 
parameterized by the number k of item sizes, even for unbounded number of item values. On the 
other hand, we provide quadratic kernel size lower bounds for general Subset Sum instances 
assuming the Exponential Time Hypothesis [16]. 

Finally, as a possible tool for future kernelization results we show that the weight reduction 
approach also carries over to polynomial ILPs so long as the maximum degree and the domains 
of variables are sufficiently small, in Sect. 6. 

2 Preliminaries 

A parameterized problem is a language $\Pi \subseteq \Sigma^* \times \mathbb{N}$, where $\Sigma$ is a finite alphabet; the second 
component $k$ of instances $(I, k) \in \Sigma^* \times \mathbb{N}$ is called the parameter. A problem $\Pi \subseteq \Sigma^* \times \mathbb{N}$ is 
fixed-parameter tractable if it admits a fixed-parameter algorithm, which decides instances $(I, k)$ of $\Pi$ 
in time $f(k) \cdot |I|^{O(1)}$ for some computable function $f$. The class of fixed-parameter tractable 
problems is denoted by FPT. Evidence that a problem $\Pi$ is unlikely to be fixed-parameter 
tractable is that $\Pi$ is W[t]-hard for some $t \in \mathbb{N}$ or W[P]-hard, where FPT $\subseteq$ W[1] $\subseteq$ W[2] $\subseteq$ 
... $\subseteq$ W[P]. To prove hardness of $\Pi$, one can give a parameterized reduction from a W[·]-hard 
problem $\Pi'$ to $\Pi$ that maps every instance $I'$ of $\Pi'$ with parameter $k'$ to an instance $I$ of $\Pi$ 
with parameter $k \le g(k')$ for some computable function $g$ such that $I$ can be computed in time 
$f(k') \cdot |I'|^{O(1)}$ for some computable function $f$, and $I$ is a “yes”-instance if and only if $I'$ is. 
If $f$ and $g$ are polynomials, such a reduction is called a polynomial parameter transformation. 
A problem $\Pi$ that is NP-complete even if the parameter k is constant is said to be para-NP-complete. 

A kernelization for a parameterized problem $\Pi$ is an efficient algorithm that given any 
instance $(I, k)$ returns an instance $(I', k')$ such that $(I, k) \in \Pi$ if and only if $(I', k') \in \Pi$ and 
such that $|I'| + k' \le f(k)$ for some computable function $f$. The function $f$ is called the size 
of the kernelization, and we have a polynomial kernelization if $f(k)$ is polynomially bounded 
in k. It is known that a parameterized problem is fixed-parameter tractable if and only if it 
is decidable and has a kernelization. Nevertheless, the kernels implied by this fact are usually 
of superpolynomial size. (The size matches the $f(k)$ from the run time, which for NP-hard 
problems is usually exponential as typical parameters are upper bounded by the instance size.) 
On the other hand, assuming FPT $\ne$ W[1], no W[1]-hard problem has a kernelization. Further, 
there are tools for ruling out polynomial kernels for some parameterized problems [8,1] under 
an appropriate complexity assumption (namely that NP $\nsubseteq$ coNP/poly). Such lower bounds can 
be transferred by the mentioned polynomial parameter transformations [3]. 

3 Settling Open Problems via the Frank-Tardos Theorem 
3.1 Frank and Tardos’ theorem 

Frank and Tardos [12] describe an algorithm which proves the following theorem. 

Theorem 1 ([12]). There is an algorithm that, given a vector $w \in \mathbb{Q}^r$ and an integer $N$, in 
polynomial time finds a vector $\bar{w} \in \mathbb{Z}^r$ with $\|\bar{w}\|_\infty \le 2^{4r^3} N^{r(r+2)}$ such that $\operatorname{sign}(w \cdot b) = \operatorname{sign}(\bar{w} \cdot b)$ 
for all vectors $b \in \mathbb{Z}^r$ with $\|b\|_1 \le N - 1$. 

This theorem allows us to compress linear inequalities to an encoding length which is polyno¬ 
mial in the number of variables. Frank and Tardos’ algorithm runs even in strongly polynomial 
time. As a consequence, all kernelizations presented in this work also have a strongly polynomial 
running time. 

Example 1. There is an algorithm that, given a vector $w \in \mathbb{Q}^r$ and a rational $W \in \mathbb{Q}$, in 
polynomial time finds a vector $\bar{w} \in \mathbb{Z}^r$ with $\|\bar{w}\|_\infty = 2^{O(r^3)}$ and an integer $\bar{W} \in \mathbb{Z}$ with total 
encoding length $O(r^4)$, such that $w \cdot x \le W$ if and only if $\bar{w} \cdot x \le \bar{W}$ for every vector $x \in \{0,1\}^r$. 

Proof. Use Theorem 1 on the vector $(w, W) \in \mathbb{Q}^{r+1}$ with $N = r + 2$ to obtain the resulting 
vector $(\bar{w}, \bar{W})$. Now let $b = (x, -1) \in \mathbb{Z}^{r+1}$ and note that $\|b\|_1 \le N - 1$. The inequality $w \cdot x \le W$ 
is false if and only if $\operatorname{sign}(w \cdot x - W) = \operatorname{sign}((w, W) \cdot (x, -1)) = \operatorname{sign}((w, W) \cdot b)$ is equal to $+1$. 
The same holds for $\bar{w} \cdot x \le \bar{W}$. 

As each $|\bar{w}_i|$ can be encoded with $O(r^3 + r^2 \log N) = O(r^3)$ bits, the whole vector $\bar{w}$ has 
encoding length $O(r^4)$. □ 
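
To make the guarantee of Example 1 concrete, the following sketch checks by brute force whether a candidate pair of small integers behaves like the original pair $(w, W)$ on all 0/1 vectors. It does not implement the Frank and Tardos algorithm (which relies on simultaneous Diophantine approximation); the function name `same_threshold_behaviour` and the hand-picked numbers are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product


def same_threshold_behaviour(w, W, w_bar, W_bar):
    """Return True iff (w.x <= W) <=> (w_bar.x <= W_bar) for every x in {0,1}^r.

    Brute force over all 2^r vectors, so only suitable for tiny r.
    (Hypothetical checker; not the compression algorithm itself.)
    """
    assert len(w) == len(w_bar)
    for x in product((0, 1), repeat=len(w)):
        original = sum(wi * xi for wi, xi in zip(w, x)) <= W
        compressed = sum(wi * xi for wi, xi in zip(w_bar, x)) <= W_bar
        if original != compressed:
            return False
    return True


if __name__ == "__main__":
    # Large rational weights and a hand-picked small replacement (assumed example data).
    w = [Fraction(1_000_000), Fraction(2_000_000), Fraction(3_000_001)]
    W = Fraction(3_000_000)
    print(same_threshold_behaviour(w, W, [1, 2, 4], 3))
    print(same_threshold_behaviour(w, W, [1, 2, 3], 3))
```

The second candidate fails because the third item alone exceeds W in the original instance but not in the replacement, which is exactly the kind of sign change the theorem rules out.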


3.2 Polynomial Kernelization for Knapsack 

A first easy application of Theorem 1 is the kernelization of Knapsack with the number n of 
different items as parameter. 


Knapsack(n) 

Input: An integer $n \in \mathbb{N}$, rationals $W, P \in \mathbb{Q}$, a weight vector $w \in \mathbb{Q}^n$, and a 
profit vector $p \in \mathbb{Q}^n$. 

Parameter: n. 

Question: Is there a vector $x \in \{0,1\}^n$ with $w \cdot x \le W$ and $p \cdot x \ge P$? 


Theorem 2. Knapsack(n) admits a kernel of size $O(n^4)$. □ 

As a consequence, also Subset Sum(n) admits a kernel of size $O(n^4)$. 





4 Small Kernels for Weighted Parameterized Problems 

The result of Frank and Tardos implies that we can easily handle large weights or numbers in 
kernelization provided that the number of different objects is already sufficiently small (e.g., 
polynomial in the parameter). In the present section we show how to handle the first step, i.e., 
the reduction of the number of objects, in the presence of weights for a couple of standard 
problems. Presumably the reduction in size of numbers is not useful for this first part since the 
number of different values is still exponential. 

4.1 Hitting Set and Set Packing 

In this section we outline how to obtain polynomial kernelizations for Weighted d-Hitting 
Set and Weighted d-Set Packing. Since these problems generalize quite a few interesting 
hitting/covering and packing problems, this extends the list of problems whose weighted versions 
directly benefit from our results. The problems are formally defined as follows. 

Weighted d-Hitting Set(k) 

Input: A set family $\mathcal{F} \subseteq \binom{U}{d}$, a function $w : U \to \mathbb{N}$, and $k, W \in \mathbb{N}$. 

Parameter: k. 

Question: Is there a set $S \subseteq U$ of cardinality at most k and weight $\sum_{u \in S} w(u) \le W$ such 
that S intersects every set in $\mathcal{F}$? 


Weighted d-Set Packing(k) 

Input: A set family $\mathcal{F} \subseteq \binom{U}{d}$, a function $w : \mathcal{F} \to \mathbb{N}$, and $k, W \in \mathbb{N}$. 

Parameter: k. 

Question: Is there a family $\mathcal{F}^* \subseteq \mathcal{F}$ of exactly k disjoint sets of weight $\sum_{F \in \mathcal{F}^*} w(F) \ge W$? 

Note that we treat d as a constant. We point out that the definition of Weighted d-Set 
Packing(k) restricts attention to exactly k disjoint sets of weight at least W. If we were to 
relax to at least k sets then the problem would be NP-hard already for k = 0. On the other hand, 
the kernelization that we present for Weighted d-Set Packing(k) holds also if we require $\mathcal{F}^*$ 
to be of cardinality at most k (and total weight at least W, as before). 

Both kernelizations rely on the Sunflower Lemma of Erdős and Rado [9], same as their 
unweighted counterparts. We recall the lemma. 

Lemma 1 (Erdős and Rado [9]). Let $\mathcal{F}$ be a family of sets, each of size d, and let $k \in \mathbb{N}$. If 
$|\mathcal{F}| > d!\,k^d$ then we can find in time $O(|\mathcal{F}|)$ a so-called (k+1)-sunflower, consisting of k+1 sets 
$F_1, \ldots, F_{k+1} \in \mathcal{F}$ such that the pairwise intersection of any two $F_i, F_j$ with $i \ne j$ is the same 
set C, called the core. 

For Weighted d-Hitting Set(k) we can apply the Sunflower Lemma directly, same as 
for the unweighted case: Say we are given $(U, \mathcal{F}, w, k, W)$. If the size of $\mathcal{F}$ exceeds $d!(k+1)^d$ 
then we find a (k+2)-sunflower $\mathcal{F}_s$ in $\mathcal{F}$ with core C. Any hitting set of cardinality at most k 
must contain an element of C. The same is true for (k+1)-sunflowers so we may safely delete 
any set $F \in \mathcal{F}_s$ since hitting the set $C \subseteq F$ is enforced by the remaining (k+1)-sunflower. 
Iterating this reduction rule yields $\mathcal{F}' \subseteq \mathcal{F}$ with $|\mathcal{F}'| = O(k^d)$ and such that $(U, \mathcal{F}, w, k, W)$ and 
$(U, \mathcal{F}', w, k, W)$ are equivalent. 

Now, we can apply Theorem 1. We can safely restrict U to the elements U' present in sets of 
the obtained set family $\mathcal{F}'$, and let $w' = w|_{U'}$. By Theorem 1 applied to weights w' and target 
weight W with $N = k + 2$ and $r = O(k^d)$ we get replacement weights of magnitude bounded by 
$2^{O(k^{3d})} N^{O(k^{2d})}$ and bit size $O(k^{3d})$. Note that this preserves, in particular, whether the sum of 
any k weights is at most the target weight W, by preserving the sign of $w_{i_1} + \ldots + w_{i_k} - W$. The 
total bitsize is dominated by the space for encoding the weight of all elements of the set U'. 

Theorem 3. Weighted d-Hitting Set(k) admits a kernelization to $O(k^d)$ sets and total 
size bounded by $O(k^{4d})$. 

For Weighted d-Set Packing(k) a similar argument works. 

Theorem 4. Weighted d-Set Packing(k) admits a kernelization to $O(k^d)$ sets and total 
size bounded by $O(k^{4d})$. 

Proof. If the size of $\mathcal{F}$ exceeds $d!(dk)^d$ then we find a (dk+1)-sunflower $\mathcal{F}_s$ in $\mathcal{F}$ with core C. 
We argue that we can safely discard the set $F_0 \in \mathcal{F}_s$ of least weight according to $w : \mathcal{F} \to \mathbb{N}$: 
This could only fail if there is a solution that includes $F_0$, namely k disjoint sets $F_0, \ldots, F_{k-1}$ 
of total weight at least W. Notice that no set $F_1, \ldots, F_{k-1}$ can contain C, since $C \subseteq F_0$. Since 
$|\mathcal{F}_s| = dk + 1$ there must be another set $F_k$, apart from $F_0$, that has an empty intersection with 
$F_1, \ldots, F_{k-1}$, as the sets in $\mathcal{F}_s$ are disjoint apart from C and there are in total $d(k-1)$ elements 
in $F_1, \ldots, F_{k-1}$. It follows that $F_1, \ldots, F_k$ is also a selection of k disjoint sets. Since $F_0$ is the 
lightest set in $\mathcal{F}_s$ we must have that the total weight of $F_1, \ldots, F_k$ is at least W. 

Iterating this rule gives $|\mathcal{F}| = O(k^d)$. Again, it suffices to preserve how the sum of any k 
weights compares with W. Thus, we get the same bound of $O(k^{3d})$ bits per element (of $\mathcal{F}$, in 
this case). □ 

4.2 Max Cut 

Let us derive a polynomial kernel for Weighted Max Cut(W), which is defined as follows. 

Weighted Max Cut(W) 

Input: A graph G, a function $w : E \to \mathbb{Q}_{\ge 1}$, and $W \in \mathbb{Q}_{\ge 1}$. 

Parameter: $\lceil W \rceil$. 

Question: Is there a set $C \subseteq V(G)$ such that $\sum_{e \in \delta(C)} w(e) \ge W$? 

Note that we chose the weight of the resulting cut as parameter, which is most natural for 
this problem. The number k of edges in a solution is not a meaningful parameter: If we restricted 
the cut to have at least k edges, the problem would again be already NP-hard for k = 0. If 
we required at most k edges, we could, in this example for integral weights, multiply all edge 
weights by $n^2$ and add arbitrary edges with weight 1 to our input graph. When setting the new 
weight bound to $n^2 \cdot W + \binom{n}{2}$, we would not change the instance semantically but there may be 
no feasible solution left with at most k edges. 

The restriction to edge weights at least 1 is necessary as otherwise the problem becomes 
intractable. This is because when allowing arbitrary positive rational weights, we can transform 
instances of the NP-complete Unweighted Max Cut problem (with all weights equal to 1 
and parameter k, which is the number of edges in the cut) to instances of the Weighted Max 
Cut problem on the same graph with edge weights all equal to 1/k and parameter W = 1. 

Theorem 5. Weighted Max Cut(W) admits a kernel of size $O(W^4)$. 

Proof. Let T be the total weight of all edges. If $T \ge 2W$, then the greedy algorithm yields a 
cut of weight at least $T/2 \ge W$. Therefore, all instances with $T \ge 2W$ can be reduced to a 
constant-size positive instance. Otherwise, there are at most 2W edges in the input graph as 
every edge has weight at least 1. Thus, we can use Theorem 1 to obtain an equivalent (integral) 
instance of encoding length $O(W^4)$. □ 
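
The proof relies on a greedy algorithm that cuts at least half of the total edge weight. One standard way to realise such a greedy rule is sketched below; the paper does not spell out the variant it has in mind, so the function `greedy_cut` and its edge-list input format are assumptions made for this illustration.

```python
def greedy_cut(vertices, weighted_edges):
    """Greedy cut of weight at least half the total edge weight.

    `weighted_edges` is a list of (u, v, weight) triples of a simple graph.
    Vertices are placed one by one on the side that cuts at least half of the
    weight of the edges to already placed neighbours. (Illustrative sketch.)
    """
    side = {}
    for v in vertices:
        # to_side[s] = weight of edges from v to placed vertices on side s.
        to_side = [0, 0]
        for a, b, wt in weighted_edges:
            if a == v and b in side:
                to_side[side[b]] += wt
            elif b == v and a in side:
                to_side[side[a]] += wt
        # Placing v on side 0 cuts exactly the edges towards side 1, and vice versa.
        side[v] = 0 if to_side[1] >= to_side[0] else 1
    cut_weight = sum(wt for a, b, wt in weighted_edges if side[a] != side[b])
    return side, cut_weight


if __name__ == "__main__":
    edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3)]
    side, weight = greedy_cut([0, 1, 2], edges)
    print(side, weight)
```

Every edge is accounted for exactly once, namely when its later endpoint is placed, and at that moment at least half of the weight under consideration ends up in the cut. Summing over all vertices gives the bound T/2 used in the proof.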
4.3 Polynomial Kernelization for Bin Packing with Additive Error 

Bin Packing is another classical NP-hard problem involving numbers. Therein we are given n 
positive integer numbers $a_1, \ldots, a_n$ (the items), a bin size $b \in \mathbb{N}$, and an integer k; the question 
is whether the integer numbers can be partitioned into at most k sets, the bins, each of sum 
at most b. From a parameterized perspective the problem is highly intractable for its natural 
parameter k, because for k = 2 it generalizes the (weakly) NP-hard Partition problem. 

Jansen et al. proved that the parameterized complexity improves drastically if instead 
of insisting on exact solutions the algorithm only has to provide a packing into k + 1 bins or 
correctly state that k bins do not suffice. Concretely, it is shown that this problem variant is 
fixed-parameter tractable with respect to k. The crucial effect of the relaxation is that small 
items are of almost no importance: If they cannot be added greedily “on top” of a feasible 
packing of big items into k + 1 bins, then the instance trivially has no packing into k bins due 
to exceeding total weight kb. Revisiting this idea, with a slightly different threshold for being 
a small item, we note that after checking for total weight being at most kb (else reporting that 
there is no k-packing) we can safely discard all small items before proceeding. Crucially, this 
cannot turn a no- into a yes-instance because the created (k+1)-packing could then also be lifted 
to one for all items (contradicting the assumed no-instance). An application of Theorem 1 then 
yields a polynomial kernelization because we can have only few large items. 


Additive One Bin Packing(k) 

Input: Item sizes $a_1, \ldots, a_n \in \mathbb{N}$, a bin size $b \in \mathbb{N}$, and $k \in \mathbb{N}$. 

Parameter: k. 

Task: Give a packing into at most k + 1 bins of size b, or correctly state that k bins 
do not suffice. 


Theorem 6. Additive One Bin Packing(k) admits a polynomial kernelization to $O(k^2)$ 
items and bit size $O(k^3)$. 


Proof. Let an instance $(a_1, \ldots, a_n, b, k)$ be given. If any item size $a_i$ exceeds b, or if the total 
weight of items $a_i$ exceeds $k \cdot b$, then we may safely answer that no packing into k bins is possible. 
In all other cases the kernelization will return an instance whose answer will be correct for the 
original instance: if it reports a (k+1)-packing then the original instance has a (k+1)-packing. 
If it reports that no k-packing is possible then the same holds for the original instance. 

Assume that the items $a_i$ are sorted decreasingly by value. Consider the subsequence, say, 
$a_1, \ldots, a_\ell$, of items of size at least $\frac{b}{k+1}$. If the instance restricted to these items permits a packing 
into at most k + 1 bins, then we show that the items $a_{\ell+1}, \ldots, a_n$ can always be added, giving a 
(k+1)-packing for the input instance: assume that a greedy packing of the small items into the 
existing packing for $a_1, \ldots, a_\ell$ fails. This implies that some item, say $a_i$, of size less than $\frac{b}{k+1}$ 
does not fit. But then all bins have less than $\frac{b}{k+1}$ remaining space. It follows that the total 
packed weight, excluding $a_i$, is more than 

$$(k+1) \cdot \left(b - \frac{b}{k+1}\right) = (k+1)b - b = kb.$$

This contradicts the fact that this part of the kernelization is only run if the total weight is at 
most kb. Thus, a (k+1)-packing for $a_1, \ldots, a_\ell$ implies a (k+1)-packing for the whole set $a_1, \ldots, a_n$. 

Clearly, if the items $a_1, \ldots, a_\ell$ permit no packing into k bins then the same is true for the 
whole set of items. 
Observe now that $\ell$ cannot be too large: Indeed, since the total weight is at most kb (else 
we returned “no” directly), there can be at most 

$$\frac{kb}{\frac{b}{k+1}} = k(k+1)$$

items of weight at least $\frac{b}{k+1}$. Thus, an application of the weight reduction tools yields a total 
size of $O(k^3)$. □ 
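
The two steps of this proof, discarding small items and later re-inserting them greedily, can be sketched as follows. This is a minimal illustration under the threshold $b/(k+1)$ used above; the function names are assumptions made for this example, and the weight compression via Theorem 1 is not included.

```python
def keep_large_items(sizes, b, k):
    """Return None if k bins cannot suffice, else the items of size >= b/(k+1).

    Assumes positive integer sizes and b >= 1. (Hypothetical helper.)
    """
    if any(a > b for a in sizes) or sum(sizes) > k * b:
        return None
    # a >= b/(k+1) is tested in integers as a*(k+1) >= b.
    large = [a for a in sizes if a * (k + 1) >= b]
    # Total weight <= k*b, so at most k*(k+1) items can be this large.
    assert len(large) <= k * (k + 1)
    return large


def add_small_items(bins, small_items, b):
    """First-fit the small items into an existing packing (a list of lists)."""
    for a in small_items:
        for current in bins:
            if sum(current) + a <= b:
                current.append(a)
                break
        else:
            # Cannot happen if there are k+1 bins, total weight <= k*b,
            # and every small item is below b/(k+1).
            raise ValueError("small item does not fit")
    return bins


if __name__ == "__main__":
    sizes, b, k = [5, 5, 4, 1, 1, 1], 10, 2
    large = keep_large_items(sizes, b, k)
    print(large)
    print(add_small_items([[5, 5], [4], []], [1, 1, 1], b))
```

In the usage example the packing of the large items into k + 1 = 3 bins is supplied by hand; in the kernelization it would come from solving the reduced instance.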

5 Kernel Bounds for Knapsack Problems 

In this section we provide lower and upper bounds for kernel sizes for variants of the Knapsack 
problem. 

5.1 Exponential Kernel for Knapsack with Few Item Sizes 

First, consider the Subset Sum problem restricted to instances with only k distinct item 
weights, which are not restricted in any other way (except for being non-negative integers). 
Then the problem can be solved by a fixed-parameter algorithm for parameter k by a reduction 
to integer linear programming in fixed dimension, and applying Lenstra’s algorithm [20] or one 
of its improvements. This was first observed by Fellows et al. [10]. 

We now generalize the results by Fellows et al. [10] to Knapsack with few item weights. 
More precisely, we are given an instance I of the Knapsack problem consisting of n items that 
have only k distinct item weights; however, the number of item values is unbounded. This means 
in particular, that the “number of numbers” is not bounded as a function of the parameter, 
making the results by Fellows et al. [10] inapplicable. 

Theorem 7. The Knapsack problem with k distinct weights can be solved in time $k^{2.5k + o(k)} \cdot \mathrm{poly}(|I|)$, 
where |I| denotes the encoding length of the instance. 

Proof. Observe that when packing $x_i$ items of weight $w_i$, it is optimal to pack the $x_i$ items 
with largest value among all items of weight $w_i$. Assume the items of weight $w_i$ are labeled as 
$j_1^{(i)}, \ldots, j_{n_i}^{(i)}$ by non-increasing values. For each $s \in \mathbb{N}$, define $f_i(s) := \sum_{\ell=1}^{s} v(j_\ell^{(i)})$, where $v(j_\ell^{(i)})$ 
denotes the value of item $j_\ell^{(i)}$. We can formulate the knapsack problem as the following program, 
in which variable $x_i$ encodes how many items of weight $w_i$ are packed into the knapsack and $g_i$ 
encodes their total value: 

$$\begin{aligned}
\max \sum_{i=1}^{k} g_i \quad \text{s.t.} \quad & \sum_{i=1}^{k} w_i \cdot x_i \le W, \\
& g_i \le f_i(x_i), && i = 1, \ldots, k, \\
& x_i \in \{0, 1, \ldots, n_i\},\ g_i \in \mathbb{N}_0, && i = 1, \ldots, k.
\end{aligned}$$

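The functions $f_i$ are in general non-linear. Their concavity implies the following lemma. 

Lemma 2. For each i there exists a set of linear functions $p_i^{(1)}, \ldots, p_i^{(n_i)}$ such that $f_i(s) = 
\min_\ell p_i^{(\ell)}(s)$ for every $s \in \{0, \ldots, n_i\}$. 

Proof. For each $\ell \in \{1, \ldots, n_i\}$ we define $p_i^{(\ell)}(s)$ to be the unique linear function such that 

$$p_i^{(\ell)}(\ell - 1) = f_i(\ell - 1) \quad \text{and} \quad p_i^{(\ell)}(\ell) = f_i(\ell).$$

The function $f_i(s)$ is concave because 

$$f_i(\ell + 1) - f_i(\ell) = v(j_{\ell+1}^{(i)}) \le v(j_\ell^{(i)}) = f_i(\ell) - f_i(\ell - 1)$$

for each $\ell \in \{1, \ldots, n_i - 1\}$. Therefore, the definition of the linear functions $p_i^{(\ell)}$ implies that 
$f_i(s) \le p_i^{(\ell)}(s)$ for every $\ell \in \{1, \ldots, n_i\}$ and $s \in \{0, \ldots, n_i\}$. Since for each $s \in \{1, \ldots, n_i\}$ we 
have that $p_i^{(s)}(s) = f_i(s)$ and $p_i^{(1)}(0) = f_i(0)$, we conclude that $f_i(s) = \min_\ell p_i^{(\ell)}(s)$ for every 
$s \in \{0, \ldots, n_i\}$. □ 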

Hence in the program above, we can, for every $i \in \{1, \ldots, k\}$, replace the constraint $g_i \le 
f_i(x_i)$ by the set of constraints $g_i \le p_i^{(\ell)}(x_i)$ for $\ell \in \{1, \ldots, n_i\}$. This way we obtain a formulation 
of the knapsack problem as an integer linear program with k variables. The encoding length 
of this integer linear program is polynomially bounded in the encoding length of the instance 
of Knapsack. Together with the algorithm by Kannan [18] this implies the fixed-parameter 
tractability of Knapsack with k item weights. Using the improved version of this algorithm by 
Frank and Tardos [12], the theorem follows. □ 
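
The linearisation step of Lemma 2 can be illustrated numerically. The sketch below builds the prefix-value function f for one weight class, constructs the line through each pair of consecutive points, and evaluates the minimum over all lines. The function names are assumptions made for this illustration; the code only demonstrates the lemma on small data and does not build or solve the integer linear program.

```python
def prefix_values(values):
    """Return [f(0), ..., f(n)], where f(s) is the total value of the s best items."""
    ordered = sorted(values, reverse=True)
    f = [0]
    for v in ordered:
        f.append(f[-1] + v)
    return f


def interpolating_lines(f):
    """For l = 1..n return (slope, intercept) of the line through
    (l-1, f(l-1)) and (l, f(l))."""
    lines = []
    for l in range(1, len(f)):
        slope = f[l] - f[l - 1]
        intercept = f[l] - slope * l
        lines.append((slope, intercept))
    return lines


def min_of_lines(lines, s):
    """Evaluate the lower envelope min_l p^(l)(s)."""
    return min(slope * s + intercept for slope, intercept in lines)


if __name__ == "__main__":
    f = prefix_values([5, 1, 3])
    lines = interpolating_lines(f)
    print(f)
    print([min_of_lines(lines, s) for s in range(len(f))])
```

Because the slopes are the item values in non-increasing order, f is concave and every line lies on or above f at the integer points, so the envelope reproduces f exactly there.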


5.2 Polynomial Kernel for Subset Sum with Few Item Sizes 

We now improve the work of Fellows et al. [10] in another direction. Namely, we show that the 
Subset Sum problem admits a polynomial kernel for parameter the number k of item sizes; 
this improves upon the exponential-size kernel due to Fellows et al. [10]. To show the kernel 
bound of $k^{O(1)}$, consider an instance I of Subset Sum with n items that have only k distinct 
item sizes. For each item size $s_i$, let $\mu_i$ be its multiplicity, that is, the number of items in I of 
size $s_i$. Given I, we formulate an ILP for the task of deciding whether some subset S of items 
has weight exactly t. The ILP simply models for each item size $s_i$ the number of items $x_i \le \mu_i$ 
selected from it as to satisfy the subset sum constraint: 

$$\left.\begin{aligned}
s_1 x_1 + \ldots + s_k x_k &= t, \\
0 \le x_i &\le \mu_i, && i = 1, \ldots, k, \\
x_i &\in \mathbb{N}_0, && i = 1, \ldots, k.
\end{aligned}\right\} \qquad (1)$$

Then (1) is an Integer Linear Programming instance on m = 1 relevant constraint and 
each variable $x_i$ has maximum range bound $u = \max_i \mu_i \le n$. 

Now consider two cases: 


- If $\log n \le k \cdot \log k$, then we apply Theorem 1 to (1) to reduce the instance to an equivalent 
instance I' of size $O(k^4 + k^3 \log n) = O(k^4 + k^3 \cdot (k \log k)) = O(k^4 \log k)$. We can reformulate I' 
as an equivalent Subset Sum instance by replacing each size $s_i$ by $O(\log \mu_i)$ new weights 
$2^j \cdot s_i$ for $0 \le j \le \ell_i$ and $(\mu_i - \sum_{j=0}^{\ell_i} 2^j) \cdot s_i$, where $\ell_i$ is the largest integer such that 
$\sum_{j=0}^{\ell_i} 2^j \le \mu_i$. Then we have $O(k \log n) = O(k^2 \log k)$ items each with a weight which can 
be encoded in length $O(k^3 + k^2 \log n + \log n) = O(k^3 \log k)$, resulting in an encoding length 
of $O(k^5 \log^2 k)$. 

- If $k \log k \le \log n$, then we solve the integer linear program (1) by the improved version of 
Kannan’s algorithm [18] due to Frank and Tardos [12] that runs in time $d^{2.5d + o(d)} \cdot s$ for 
integer linear programs of dimension d and encoding size s. As (1) has dimension d = k and 
encoding size s = |I|, the condition $k^k \le n$ means that we can solve the ILP (and hence 
decide the instance I) in time $k^{2.5k + o(k)} \cdot s = n^{O(1)}$. 
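
The replacement of a multiplicity by logarithmically many weights in the first case can be sketched as follows. This is a minimal illustration of the binary splitting described there; the function names `split_multiplicity` and `reachable_sums` are assumptions made for this example.

```python
def split_multiplicity(size, multiplicity):
    """Replace `multiplicity` copies of `size` by O(log multiplicity) weights.

    The weights are 2^j * size for j = 0..l, with l maximal such that
    2^0 + ... + 2^l <= multiplicity, plus one remainder weight if needed.
    """
    weights = []
    power = 1
    remaining = multiplicity
    while power <= remaining:
        weights.append(power * size)
        remaining -= power
        power *= 2
    if remaining > 0:
        weights.append(remaining * size)
    return weights


def reachable_sums(weights):
    """All subset sums of the given weights (exponential; for tiny inputs only)."""
    sums = {0}
    for w in weights:
        sums |= {s + w for s in sums}
    return sums


if __name__ == "__main__":
    ws = split_multiplicity(3, 10)
    print(ws)
    print(sorted(reachable_sums(ws)))
```

Choosing any subset of the new weights corresponds to choosing between 0 and $\mu_i$ copies of $s_i$, so the set of achievable totals is unchanged while the number of items drops from $\mu_i$ to $O(\log \mu_i)$.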

In summary, we have shown the following: 
Theorem 8. Subset Sum with k item sizes admits a kernel of size $O(k^5 \log^2 k)$. Moreover, 
it admits a kernel of size $O(k^4 \log k)$ if the multiplicities of the item weights can be encoded in 
binary. 

We remark that this method does not work if the instance I is succinctly encoded by specifying 
the k distinct item weights $w_i$ in binary and for each item size $s_i$ its multiplicity $\mu_i$ in binary: 
then the running time of Frank and Tardos’ algorithm can be exponential in k and the input 
length of the subset sum instance, which is $O(k \cdot \log n)$. 


5.3 A Kernelization Lower Bound for Subset Sum 

In the following we show a kernelization lower bound for Subset Sum assuming the Exponential 
Time Hypothesis. The Exponential Time Hypothesis [16] states that there does not exist a $2^{o(n)}$-time 
algorithm for 3-SAT, where n denotes the number of variables. 

Lemma 3. Subset Sum does not admit a $2^{o(n)}$-time algorithm assuming the Exponential Time 
Hypothesis, where n denotes the number of numbers. 

Proof. The proof is based on a polynomial-time reduction by Gurari m that transforms any 3- 
SAT formula <f> with n variables v\,... ,v n and m clauses C±,..., C m into an equivalent instance 
of Subset Sum with exactly 2 n + 2m numbers. 

For j £ {1,..., m}, let clause Cj = {cj\\Zcj 2 Vcj 3 ), where Cji,Cj 2 ,Cj 3 £ +i, ~>vi, ■.. ,v n , ->v n }. 
As an intermediate step in the reduction, we consider the following system of linear equations 
in which we interpret v t and —<Vi as variables and introduce additional variables yj and y'- for 
every j € {1 ,... ,m}: 

Vz £ {1,... ,n} : Vi + ->Vi = 1, 

Vj £ {1, ■ ■ ■, m} : cji + c j2 + c j3 + yj + y' = 3. 

It can easily be checked that this system of linear equations has a solution over $\{0,1\}$ if and
only if the formula is satisfiable. Relabeling the variables yields a reformulation of this system as

\[
\begin{pmatrix} a_{1,1} \\ \vdots \\ a_{n+m,1} \end{pmatrix} z_1 + \dots +
\begin{pmatrix} a_{1,2n+2m} \\ \vdots \\ a_{n+m,2n+2m} \end{pmatrix} z_{2n+2m} =
\begin{pmatrix} c_1 \\ \vdots \\ c_{n+m} \end{pmatrix},
\]

where $a_{i,j} \in \{0,1\}$ and $c_i \in \{1,3\}$. We can rewrite this system of equations as the single equation

\[ a_1 z_1 + \dots + a_{2n+2m} z_{2n+2m} = C, \tag{4} \]

where each $a_j \in \mathbb{N}$ is the integer with decimal representation $a_{1,j} \dots a_{n+m,j}$ and $C \in \mathbb{N}$ denotes
the integer with decimal representation $c_1 \dots c_{n+m}$. Equation (4) is equivalent to the system above,
because for every row $h$ the sum $a_{h,1} + \dots + a_{h,2n+2m}$ is at most five. This ensures that no carryovers occur and the
$h$-th digit of the sum $a_1 z_1 + \dots + a_{2n+2m} z_{2n+2m}$ is equal to the sum $a_{h,1} z_1 + \dots + a_{h,2n+2m} z_{2n+2m}$.
It follows that (4) is satisfiable over $\{0,1\}$ if and only if the system is satisfiable over $\{0,1\}$.

As a result, the 3-SAT formula $\varphi$ is satisfiable if and only if the tuple $(a_1, \dots, a_{2m+2n}, C)$ is
a “yes”-instance for Subset Sum. Now assume there is an algorithm for Subset Sum that runs
in time $2^{o(\ell)}$, where $\ell$ denotes the number of numbers. With the reduction above we could use
this algorithm to decide whether or not $\varphi$ is satisfiable in time $2^{o(n+m)}$. Due to the sparsification
lemma of Impagliazzo et al. [16], this contradicts the Exponential Time Hypothesis. □
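
The digit encoding used in the proof can be sketched in a few lines of Python. This is an illustrative sketch, not Gurari's original presentation: the function names `sat_to_subset_sum` and `has_subset_sum`, the signed-integer literal convention, and the brute-force checker are all assumptions made for this example.

```python
def sat_to_subset_sum(n, clauses):
    """Encode a 3-CNF formula as a Subset Sum instance with 2n + 2m numbers.

    n: number of variables, numbered 1..n (assumed convention).
    clauses: list of 3-tuples of non-zero ints; +i stands for v_i, -i for NOT v_i.
    """
    m = len(clauses)
    rows = n + m  # one decimal digit per equation of the linear system

    def to_int(digits):
        # Read a column of digits as a decimal integer, row 1 being the leading digit.
        return int("".join(map(str, digits)))

    numbers = []
    for i in range(1, n + 1):
        for lit in (i, -i):
            col = [0] * rows
            col[i - 1] = 1                      # equation v_i + NOT v_i = 1
            for j, clause in enumerate(clauses):
                col[n + j] = int(lit in clause)  # literal occurs in clause j
            numbers.append(to_int(col))
    for j in range(m):
        col = [0] * rows
        col[n + j] = 1
        numbers.extend([to_int(col), to_int(col)])  # slack variables y_j and y'_j
    target = to_int([1] * n + [3] * m)
    return numbers, target


def has_subset_sum(numbers, target):
    """Naive check over reachable sums; only meant for tiny instances."""
    reachable = {0}
    for a in numbers:
        reachable |= {s + a for s in reachable if s + a <= target}
    return target in reachable


# (v1 OR v2 OR v3) AND (NOT v1 OR NOT v2 OR NOT v3) is satisfiable.
numbers, target = sat_to_subset_sum(3, [(1, 2, 3), (-1, -2, -3)])
print(len(numbers), has_subset_sum(numbers, target))
```

Every digit column sums to at most five, so adding the chosen numbers never produces a carry, which is exactly the property the proof relies on.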

Theorem 9. Subset Sum does not admit kernels of size $O(n^{2-\varepsilon})$ for any $\varepsilon > 0$ assuming the
Exponential Time Hypothesis, where $n$ denotes the number of numbers.

Proof. Assume there exists a kernelization algorithm $A$ for Subset Sum that produces instances
of size at most $\kappa n^{2-\varepsilon}$ for some $\kappa > 0$ and some $\varepsilon > 0$. We show that $A$ can be utilized to solve
Subset Sum in time $2^{o(n)}$, which contradicts the Exponential Time Hypothesis due to Lemma 3.
Let $I$ be an arbitrary Subset Sum instance with $n$ items. We apply the kernelization algorithm
$A$ to obtain an equivalent instance $I'$ whose encoding size is at most $\kappa n^{2-\varepsilon}$. Let $a_1, \dots, a_m$
be the numbers in $I'$ and let $c$ be the target value.

Let $k = n^{1-\varepsilon/2}$. We divide the numbers in $I'$ into two groups: a number $a_i$ is called heavy
if $a_i \ge 2^k$ and light otherwise. Since one needs at least $k$ bits to encode a heavy number, the
number of heavy numbers is bounded from above by $\kappa n^{2-\varepsilon}/k = \kappa n^{1-\varepsilon/2}$.

We solve instance $I'$ as follows: for each subset $J_H$ of heavy numbers, we determine whether
or not there exists a subset $J_L$ of light numbers such that $\sum_{i \in J_L \cup J_H} a_i = c$ via dynamic
programming. Since there are at most $\kappa n^{1-\varepsilon/2}$ heavy numbers, there are at most $2^{\kappa n^{1-\varepsilon/2}}$ subsets $J_H$.
The dynamic programming algorithm runs in time $O(m^2 \cdot 2^{n^{1-\varepsilon/2}})$, as each of the at most $m$
light numbers is bounded from above by $2^{n^{1-\varepsilon/2}}$. Hence, instance $I'$ can be solved in time
$O(m^2 \cdot 2^{(1+\kappa) n^{1-\varepsilon/2}}) = 2^{o(n)}$, where the equation follows because $m \le \kappa n^{2-\varepsilon} = 2^{o(n)}$. □
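
The heavy/light strategy from the proof can be sketched as follows. The function name `solve_heavy_light` and the threshold parameter `k` passed in by the caller are assumptions for illustration; the sketch also computes the table of light sums once instead of once per heavy subset, which is a simplification and not the procedure exactly as stated.

```python
from itertools import combinations


def solve_heavy_light(numbers, target, k):
    """Decide Subset Sum by enumerating heavy subsets and tabulating light sums.

    A number is heavy if it is at least 2**k and light otherwise.
    """
    heavy = [a for a in numbers if a >= 2 ** k]
    light = [a for a in numbers if a < 2 ** k]

    # Dynamic programming over the light numbers: the set of all sums that
    # some subset of light numbers attains. Each sum is below len(light) * 2**k.
    light_sums = {0}
    for a in light:
        light_sums |= {s + a for s in light_sums}

    # Enumerate every subset J_H of heavy numbers and look up the remainder.
    for size in range(len(heavy) + 1):
        for subset in combinations(heavy, size):
            if target - sum(subset) in light_sums:
                return True
    return False


print(solve_heavy_light([3, 5, 1024, 4096], 1032, k=4))
```

The table of light sums has at most $m \cdot 2^k$ entries, while the outer loop visits one subset per choice of heavy numbers, mirroring the two factors in the running time bound.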

6 Integer Polynomial Programming with Bounded Range 

Up to now, we used Frank and Tardos’ result only for linear inequalities with mostly binary 
variables. But it also turns out to be useful for more general cases, namely for polynomial 
inequalities with integral bounded variables. We use this to show that Integer Polynomial 
Programming instances can be compressed if the variables are bounded. As a special case, 
Integer Linear Programming admits a polynomial kernel in the number of variables if the 
variables are bounded. 

Let us first transfer the language of Theorem 1 to arbitrary polynomials.

Lemma 4. Let $f \in \mathbb{Q}[X_1, \dots, X_n]$ be a polynomial of degree at most $d$ with $r$ non-zero
coefficients, and let $u \in \mathbb{N}$. Then one can efficiently compute a polynomial $\tilde{f} \in \mathbb{Z}[X_1, \dots, X_n]$ of
encoding length $O(r^4 + r^3 d \log(ru) + r d \log(nd))$ such that $\mathrm{sign}(f(x) - f(y)) = \mathrm{sign}(\tilde{f}(x) - \tilde{f}(y))$
for all $x, y \in \{-u, \dots, u\}^n$.

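Proof. Let $w_1, \dots, w_r \in \mathbb{Q}$ and $f_1, \dots, f_r \in \mathbb{Q}[X_1, \dots, X_n]$ be pairwise distinct monomials with
coefficient 1 such that $f = \sum_{i=1}^{r} w_i \cdot f_i$. Apply Theorem 1 to $w = (w_1, \dots, w_r)$ and $N = 2ru^d + 1$
to obtain $\tilde{w} = (\tilde{w}_1, \dots, \tilde{w}_r) \in \mathbb{Z}^r$. Set $\tilde{f} = \sum_{i=1}^{r} \tilde{w}_i \cdot f_i$.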

The encoding length of each $\tilde{w}_i$ is upper bounded by $O(r^3 + r^2 \log N) = O(r^3 + r^2 \cdot d \cdot \log(r \cdot u))$.
As there are $\binom{n+d}{d}$ monomials of degree at most $d$, the information to which monomial a
coefficient belongs can be encoded in $O(\log((n+d)^d)) = O(d \log(nd))$ bits. Hence, the encoding
length of $\tilde{f}$ is upper bounded by

\[ O(r^4 + r^3 d \log(ru) + r d \log(nd)). \]

To prove the correctness of our construction, let $x, y \in \{-u, \dots, u\}^n$. For $1 \le i \le r$, set
$b_i = f_i(x) - f_i(y) \in \mathbb{Z} \cap [-2u^d, 2u^d]$, and set $b = (b_1, \dots, b_r)$. Then $\|b\|_1 \le r \cdot 2u^d$, and thus by
Theorem 1, $\mathrm{sign}(w \cdot b) = \mathrm{sign}(\tilde{w} \cdot b)$. Then also $\mathrm{sign}(f(x) - f(y)) = \mathrm{sign}(\tilde{f}(x) - \tilde{f}(y))$, as

\[ f(x) - f(y) = \sum_{i=1}^{r} w_i \cdot (f_i(x) - f_i(y)) = \sum_{i=1}^{r} w_i \cdot b_i = w \cdot b, \]

and

\[ \tilde{f}(x) - \tilde{f}(y) = \sum_{i=1}^{r} \tilde{w}_i \cdot (f_i(x) - f_i(y)) = \sum_{i=1}^{r} \tilde{w}_i \cdot b_i = \tilde{w} \cdot b. \]

This completes the proof of the lemma. □ 
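
The bookkeeping in this proof can be checked numerically on a small example. The sketch below does not implement the compression of Frank and Tardos; it only verifies, for one made-up polynomial, that $f(x) - f(y) = w \cdot b$ and that $b$ respects the norm bound required by the theorem. The dictionary representation of polynomials and all function names are assumptions of this example.

```python
from fractions import Fraction
from math import prod


def monomial_value(exponents, point):
    """Value of the monomial X_1^e_1 * ... * X_n^e_n at an integer point."""
    return prod(v ** e for v, e in zip(point, exponents))


def evaluate(poly, point):
    """Evaluate a polynomial stored as {exponent tuple: coefficient}."""
    return sum(w * monomial_value(e, point) for e, w in poly.items())


def difference_vector(poly, x, y):
    """The vector b with b_i = f_i(x) - f_i(y), one entry per monomial f_i."""
    return [monomial_value(e, x) - monomial_value(e, y) for e in poly]


# Hypothetical example: f = 1/2 * X1^2 * X2 - 3 * X2 + 7/3 * X1, so d = 3 and r = 3.
f = {(2, 1): Fraction(1, 2), (0, 1): Fraction(-3), (1, 0): Fraction(7, 3)}
u, d, r = 2, 3, len(f)
x, y = (2, -1), (-2, 2)

b = difference_vector(f, x, y)
w = list(f.values())
w_dot_b = sum(wi * bi for wi, bi in zip(w, b))

assert all(abs(bi) <= 2 * u ** d for bi in b)      # b_i lies in [-2u^d, 2u^d]
assert sum(abs(bi) for bi in b) <= r * 2 * u ** d   # ||b||_1 <= r * 2u^d = N - 1
assert w_dot_b == evaluate(f, x) - evaluate(f, y)   # f(x) - f(y) = w . b
```

Since the same vector $b$ appears for $f$ and for the compressed polynomial, only the coefficient vector has to be replaced, which is what makes the theorem applicable.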

We use this lemma to compress Integer Polynomial Programming instances. 


Integer Polynomial Programming 

Input: Polynomials $c, g_1, \dots, g_m \in \mathbb{Q}[X_1, \dots, X_n]$ of degree at most $d$ encoded by the
coefficients of the $O(n^d)$ monomials, rationals $b_1, \dots, b_m, z \in \mathbb{Q}$, and $u \in \mathbb{N}$.
Question: Is there a vector $x \in \{-u, \dots, u\}^n$ with $c(x) \le z$ and $g_i(x) \le b_i$ for $i = 1, \dots, m$?


Theorem 10. Every Integer Polynomial Programming instance in which $c$ and each $g_i$
consist of at most $r$ monomials can be efficiently compressed to an equivalent instance with an
encoding length that is bounded by $O(m(r^4 + r^3 d \log(ru) + r d \log(nd)))$.

Proof. Define $c', g'_1, \dots, g'_m : \mathbb{Z}^n \times \{0,1\} \to \mathbb{Q}$ as

\[ c'(x,y) := c(x) + y \cdot z, \]

\[ g'_i(x,y) := g_i(x) + y \cdot b_i, \qquad i = 1, \dots, m. \]

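Now apply Lemma 4 to $c'$ and $g'_1, \dots, g'_m$ to obtain $\tilde{c}'$ and $\tilde{g}'_1, \dots, \tilde{g}'_m$. Thereafter, split these
functions up into their parts $(\tilde{c}, \tilde{z})$ and $(\tilde{g}_1, \tilde{b}_1), \dots, (\tilde{g}_m, \tilde{b}_m)$. We claim that the instance
$\tilde{I} = (\tilde{c}, \tilde{g}_1, \dots, \tilde{g}_m, d, \tilde{b}_1, \dots, \tilde{b}_m, \tilde{z}, u)$ is equivalent to $I$. To see this, we have to show that a vector
$x \in \{-u, \dots, u\}^n$ satisfies $c(x) \le z$ if and only if it satisfies $\tilde{c}(x) \le \tilde{z}$ (and analogously $g_i(x) \le b_i$
if and only if $\tilde{g}_i(x) \le \tilde{b}_i$ for all $i$). This follows from

\begin{align*}
\mathrm{sign}(c(x) - z) &= \mathrm{sign}(c'(x,0) - c'(0,1)) \\
&\overset{(\star)}{=} \mathrm{sign}(\tilde{c}'(x,0) - \tilde{c}'(0,1)) \\
&= \mathrm{sign}(\tilde{c}(x) - \tilde{z}),
\end{align*}

where equality $(\star)$ follows from Lemma 4.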

It remains to show the upper bound on the encoding length of $\tilde{I}$. Each of the tuples $(\tilde{c}, \tilde{z})$,
$(\tilde{g}_1, \tilde{b}_1), \dots, (\tilde{g}_m, \tilde{b}_m)$ can be encoded with

\[ O(r^4 + r^3 d \log(ru) + r d \log(nd)) \]

bits. The variables $d$ and $u$ can be encoded with $O(\log d + \log u)$ bits. In total, this yields the
desired upper bound on the kernel size. □
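
A small worked instance may make the lifting step in this proof concrete. The polynomial $c$ and the bound $z$ below are made up for illustration, and the example assumes that $c$ has no constant term, so that $c(0) = 0$.

```latex
\begin{align*}
c(X_1, X_2) &= X_1^2 - 2 X_2, \qquad z = 3, \\
c'(X_1, X_2, y) &= X_1^2 - 2 X_2 + 3y, \\
c'(x, 0) &= c(x), \qquad c'(0, 1) = c(0) + z = 3, \\
c'(x, 0) - c'(0, 1) &= c(x) - z .
\end{align*}
```

The bound $z$ thus becomes the coefficient of the extra monomial $y$, so a single application of Lemma 4 to $c'$ compresses $c$ and $z$ together.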

This way, Theorem 10 extends an earlier result by Granot and Skorin-Karpov [13] who
considered the restricted variant of $d = 2$.

As $r$ is bounded from above by $O((n+d)^d)$, Theorem 10 yields a polynomial kernel for the
combined parameter $(n, m, u)$ for constant dimensions $d$. In particular, Theorem 10 provides
a polynomial kernel for Integer Linear Programming for combined parameter $(n, m, u)$.
This provides a sharp contrast to the result by Kratsch that Integer Linear Programming
does not admit a polynomial kernel for combined parameter $(n, m)$ unless the polynomial
hierarchy collapses to the third level.

7 Conclusion


In this paper we obtained polynomial kernels for the Knapsack problem parameterized by the 
number of items. We further provide polynomial kernels for weighted versions of a number of 
fundamental combinatorial optimization problems, as well as integer polynomial programs with 
bounded range. Our small kernels are built on a seminal result by Frank and Tardos about 
compressing large integer weights to smaller ones. Therefore, a natural research direction to 
pursue is to improve the compression quality provided by the Frank-Tardos algorithm. 

For the weighted problems we considered here, we obtained polynomial kernels whose sizes
are generally larger by some degrees than the best known kernel sizes for the unweighted
counterparts of these problems. It would be interesting to know whether this increase in kernel size
as compared to unweighted problems is actually necessary (for instance, because space is needed
not only for the objects but also for encoding the weights), or whether the kernel sizes of
the unweighted problems can be matched.